Abstract

Background: The Tc1/mariner superfamily of transposable elements (TEs) is widespread in animal genomes.
Mariner-like elements, which bear a DDD triad catalytic motif, have been identified in a wide range of flowering
plant species. However, Tc1-like elements, the founding members of the superfamily, which bear a DD34E triad
catalytic motif, are known only from unikonts (animals, fungi, and Entamoeba).

Results: Here we report the identification of Tc1-like elements (TLEs) in plant genomes. These elements bear the
four terminal nucleotides and the characteristic DD34E triad motif of the Tc1 element. The two TLE families (PpTc1,
PpTc2) identified in the moss (Physcomitrella patens) genome contain highly similar copies. Multiple copies of PpTc1
are actively transcribed and the transcripts encode intact full-length transposase coding sequences. TLEs are also
found in angiosperm genome sequence databases of rice (Oryza sativa), dwarf birch (Betula nana), cabbage
(Brassica rapa), hemp (Cannabis sativa), barley (Hordeum vulgare), lettuce (Lactuca sativa), poplar (Populus trichocarpa),
pear (Pyrus x bretschneideri), and wheat (Triticum urartu).

Conclusions: This study extends the occurrence of TLEs to the plant phylum. The elements in the moss genome 
have amplified recently and may still be capable of transposition. The TLEs are also present in angiosperm 
genomes, but apparently much less abundant than in moss. 

Keywords: Transposable elements, Moss, Tc1-mariner-IS630 superfamily, Tc1-like elements, Mariner-like elements,
Plant genome, Evolution, Transposition activity



Background 

Transposable elements (TEs) are a major component of most eukaryotic genomes. Their transposition
may lead to an increase in their copy numbers. TEs are classified into two categories (Class I and
Class II) based on their mechanism of transposition. Class II elements are DNA transposons that
adopt a 'cut-and-paste' approach catalyzed by enzymes called transposases. The elements of this
class are further divided into superfamilies based on their different types of transposases. All of
the transposases of these elements bear a DDE/D triad motif; however, different superfamilies have
distinct transposases and structural features such as the length of the duplicated target site
sequences [1,2]. Despite the growing number of reported active TEs, the majority of transposable
elements are not active [3,4]. These elements are important for the dynamic structure of the genome
during evolution [5,6]. The immobilized TEs can serve as raw genetic material for genome tinkering
[7-15]. Autonomous TEs encode and produce transposases for their own mobilization. Non-autonomous
elements have lost the ability to encode functional transposases and rely on other sources of
transposases for transposition. An extreme group of non-autonomous elements is the miniature
inverted-repeat transposable elements (MITEs). They are short elements and have high copy numbers
[16-18].

Tc1-mariner-IS630 is a Class II TE superfamily first identified in nematode and insect genomes
[19]. The superfamily was named after Tc1 in Caenorhabditis elegans [20] and mariner in Drosophila
mauritiana [21]. This superfamily is characterized by two terminal inverted repeats (TIRs) of
typically 12 to 28 nt flanked by dinucleotide target site duplications (TSDs) of 'TA'. The
transposases of this superfamily contain a triad catalytic motif consisting of two aspartic acid (D)
residues followed by a glutamate residue (E) in Tc1-like elements (TLEs), or by a third aspartic
acid (DDD) in Mariner-like elements (MLEs) and pogo-like elements [22,23]. The pocket formed by
these residues contains the metal ions needed in the DNA cleavage reaction during transposition
[24]. Based on the number of residues between the second and third catalytic residues of the DDE/D
motif, Tc1/mariner catalytic domains can be DD34E, DD34D, DD31-33D, DD35E, DD37D, DD37E, or DD39D,
each defining a subgroup of the Tc1/mariner superfamily [18,22,25-27]. Tc1/mariner elements were
considered to be confined to animals until the recent identification of DD39D mariner-like elements
and pogo-like elements in plants [18,22,23]. Tc1-like elements are the founding subgroup of the
Tc1/mariner superfamily and they bear the DD34E triad catalytic motif [20]. Previous studies have
identified TLEs in a variety of animals and fungi [23] as well as in the parasitic amoebozoan
Entamoeba invadens [28]. However, to the best of our knowledge, there has been no report of TLEs
outside the unikonts (animals, fungi, and amoebozoa) [29]. Some of the TLEs identified in animal and
fungal genomes have been demonstrated to be active, including Tc1 and Tc3 in C. elegans [20,30,31],
Minos in Drosophila hydei [32], and Impala in the fungus Fusarium oxysporum [33,34]. The
reconstructed fish element Sleeping Beauty is also a TLE [35]. Tc1-like elements named Hydargo have
been identified in Entamoeba parasites [28].
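The subgroup nomenclature described above (DD34E, DD34D, DD39D, and so on) encodes the number of residues between the second and third catalytic residues together with the identity of the third residue. A minimal Python sketch of that naming rule follows; the function name, the 0-based positions, and the toy peptide are illustrative assumptions and are not taken from the study.

```python
def triad_label(protein: str, second_d: int, third: int) -> str:
    """Name a DDE/D triad from the positions of its last two residues.

    `second_d` and `third` are 0-based indices (hypothetical inputs, for
    example taken from an alignment) of the second aspartate and of the
    third catalytic residue.
    """
    protein = protein.upper()
    if protein[second_d] != "D":
        raise ValueError("second catalytic residue must be D")
    if protein[third] not in "DE":
        raise ValueError("third catalytic residue must be D or E")
    # Number of residues strictly between the two catalytic positions.
    spacing = third - second_d - 1
    return f"DD{spacing}{protein[third]}"
```

For a toy peptide in which an aspartate is followed by 34 other residues and then a glutamate, the function returns the label used for Tc1-like elements.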

Here we report the identification of TLEs in plants. 
The two families of full-length TLEs in the moss (Phys- 
comitrella patens) genome have multiple copies that 
contain an intact open reading frame (ORF). These 
ORFs are actively transcribed and presumably also trans- 
lated into functional transposases in moss. TLEs were 
also found in the genome sequence databases of angio- 
sperm plants. 

Results 

Tc1-like elements in moss

Mariner-like elements are widespread in plant genomes [18,36]. To investigate whether plant genomes
contain TLEs, moss genome sequence databases were screened because mosses are among the first
terrestrial plants. When the sequence of the Tc1 transposase was used as the query for a BLAST
search against the moss (Physcomitrella patens) genome database, which has a coverage of
approximately 8.6X [37], 118 high-scoring hits (e-value: <e-8) were obtained. Close inspection of
the output revealed two groups of elements that have complete terminal inverted repeats (TIRs) with
terminal 5'-CAGT ... ACTG-3' sequences flanked by TSDs of the dinucleotide 'TA'. Both groups of
elements contain open reading frames for transposases bearing a DD34E motif. These characteristics
indicate that the two groups are TLEs, and they were designated PpTc1 and PpTc2. Neither of the two
families has been previously described or annotated [37]. No similar elements or transposase
sequences were found in the genome of the spike moss Selaginella moellendorffii.
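The structural signature described above (element termini 5'-CAGT ... ACTG-3' flanked on both sides by a 'TA' target site duplication) can be checked mechanically. The following Python sketch is only an illustration of that check; the function names and the toy sequences in the usage note are assumptions, and the study itself inspected the BLAST output manually.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_tle_signature(left_flank: str, element: str, right_flank: str,
                      tip: str = "CAGT", tsd: str = "TA") -> bool:
    """Check the terminal tetranucleotides and the flanking TSD.

    The element must start with `tip` and end with its reverse complement,
    and the TSD must sit immediately next to both ends of the element.
    """
    element = element.upper()
    return (element.startswith(tip)
            and element.endswith(reverse_complement(tip))
            and left_flank.upper().endswith(tsd)
            and right_flank.upper().startswith(tsd))
```

With made-up sequences, `has_tle_signature("GGTA", "CAGTAAAACTG", "TACC")` is true, whereas changing the right flank to `"GGCC"` removes the TSD and makes the check fail.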

The full-length PpTc1 elements are 1,584 bp long with TIRs of 33 bp. Each encodes an ORF of 338 aa
with two helix-turn-helix domains and a catalytic DD34E domain (Figure 1). A total of 85 copies
were retrieved from the P. patens genome sequence database. Among them, 75 were full length with
intact ends and an average sequence identity of 96.3%; 52 of these were highly similar copies with
>98% sequence identity, but no copies were identical. Nine copies were found to carry an intact
full-length ORF (338 aa). To gain insight into the insertion sites of PpTc1 elements, it is useful
to inspect sequences homologous to the sequences flanking the insertions. Such sequences that do
not bear the TE insertion are called related empty sites (RESs). The sequence signatures of the TE
insertion sites on RESs may reflect historical transposition events. Among the 75 full-length
copies, RESs were found for the flanking sequences of 42 copies, 14 of which lie in AT-rich simple
repeat sequences (Additional file 1: Figure S1). Most of the 28 RESs that are not AT-rich simple
repeats correspond to the sequences before insertion of the elements; some (for example, that of
scaffold 54) may have resulted from excision events and subsequent repair.

The full-length PpTc2 elements are 1,709 bp long, with TIRs of 33 bp (Figure 1). A total of 22
copies of PpTc2 were retrieved from the genome database. The 20 full-length copies have an average
sequence identity of 96.6%. Eight copies of PpTc2 bear a full-length intact 338 aa ORF. Among the
20 full-length copies, RESs were found for the flanking sequences of three copies (Additional
file 1: Figure S1). While the RES of scaffold 10 clearly represents a site before insertion of an
element, that of scaffold 136 may have resulted from an excision event and subsequent repair of the
excision site. Interestingly, the insertion of PpTc2 in scaffold 281 is accompanied by a duplication
of a microsatellite unit at the insertion site. These RESs of PpTc insertion sites demonstrate the
genomic changes caused by the activity of these elements during evolution.

Figure 1 Tc1-like elements in the moss genome. Schematics of PpTc1 and PpTc2 element structures. Black triangles, TIRs; regions in green or
red, non-coding sequences; regions in yellow or brown, open reading frames; HTH, helix-turn-helix DNA binding motif; DDE, catalytic DD34E
triad motif.

Comparison of PpTc1 and PpTc2

The history of activity of these elements in the genome is an important part of their evolution.
According to the molecular clock theory, the mutations accumulated in each copy of an element in a
TE family can be used to infer the time of divergence from their ancestral element [38]. The
sequence of the ancestral element of a TE family may be approximated by the consensus sequence of
the family. Therefore, elements produced in the same time frame can be expected to have similar
levels of sequence divergence from the ancestral element. Based on the consensus sequences of PpTc1
and PpTc2, the sequence divergence score was calculated for each copy, and the number of elements
in each range of sequence divergence was plotted against the divergence range. The PpTc1 family has
an average divergence value of 2.18 ± 0.08% with a significant peak at 1.5% sequence divergence
(Figure 2). Assuming a rate of 1% sequence divergence per million years, this suggests that a burst
of amplification of this family occurred about 1.5 million years ago and that the rate of
amplification has since decreased. The PpTc2 family has an average sequence divergence value of
2.17 ± 0.20% with the most recent peak at about 1%, suggesting that PpTc2, similar to PpTc1,
amplified recently, about 1 million years ago. Interestingly, the PpTc2 dynamics is similar to the
cycles of TE amplification described previously [39].
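The age estimates above follow from a simple molecular clock conversion: divergence from the consensus divided by the assumed substitution rate. A minimal Python sketch of that arithmetic is given below; the function name is illustrative, and the default rate of 1% per million years is the value stated in the text.

```python
def amplification_age_myr(divergence_percent: float,
                          rate_percent_per_myr: float = 1.0) -> float:
    """Convert divergence from the consensus (%) into an age in million years.

    Assumes a constant rate of sequence divergence, as in the text.
    """
    if rate_percent_per_myr <= 0:
        raise ValueError("rate must be positive")
    return divergence_percent / rate_percent_per_myr
```

Applied to the peaks reported for the two families, a 1.5% peak corresponds to about 1.5 million years and a 1% peak to about 1 million years. The estimate scales inversely with the assumed rate, so a different rate would shift both dates proportionally.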



Figure 2 Sequence divergence of full-length elements of PpTc1
and PpTc2. Y-axis, number of elements; x-axis, level of sequence
divergence from the consensus sequence of the PpTc1 or PpTc2 family.



Although PpTc1 and PpTc2 bear identical extreme terminal sequences, 'CAGT' (Figure 1), their
internal regions do not show detectable DNA sequence similarity. Even the transposase coding
sequences of the two elements do not share significant sequence similarity. When the putative
peptide sequences of the two transposases were aligned, they shared 26% (89/338) sequence identity
with 47% positives (161/338) (Figure 3A). These results suggest that the two elements share only a
very distant common ancestor. However, the very similar intra-family sequence divergence levels of
the two families suggest that they may have invaded and amplified in the moss genome at a similar
time during evolution.

Since the crystal structures of Mos1 and of the DNA binding domain of Tc3 have been determined, the
transposase structures of PpTc1 and PpTc2 can be predicted based on these templates [24,40]. On the
Phyre2 web server, the transposase structure of Mos1 was used by the algorithm to model the two
transposases. The homology models have 100% confidence with about 95% coverage of the query
sequences, suggesting that the two proteins are highly similar in structure to each other and to
the Mos1 transposase (Figure 3B). Based on the structural features of Mos1, similar features were
predicted on the models of the PpTc1 and PpTc2 transposases. These models provide important
starting information for understanding the functionality of these transposases and their structural
and functional deviations from other transposases in the Tc1/mariner superfamily.

Figure 3 Comparison of the putative transposases of PpTc1 and PpTc2. (A) Alignment of peptide sequences. Colored residues: blue to cyan,
α-helices of HTH motifs; green to yellow, DD34E triad motif. (B) Predicted three-dimensional ribbon models of transposases. Blue to red, N
terminus to C terminus; HTH1 and HTH2, putative DNA binding (both) and dimerization (HTH1 only); clamp, loop structure that potentially interacts
with the linker of the other monomer in a transposase dimer; linker, potentially interacts with the clamp loop of the other monomer in a dimer;
DD34E, catalytic active center.

Expression of PpTc1 in moss

The high intra-family sequence similarity in PpTc1 and PpTc2 and the presence of multiple copies
that contain intact transposase coding sequences indicate that they are potentially active.
Expression of the transposase is required for transposition activity; therefore, it is important to
determine whether PpTc1 and PpTc2 are actively transcribed. Extensive sequencing of the moss
transcriptome has been reported previously [41]. The expressed sequence tags derived from
protonemal tissue and gametophores have been analyzed extensively, resulting in an assembled
transcript database, Pp0409, that contains 47,557 entries (www.cosmoss.org). Expressed sequence tag
coverage of the genome assembly is 98% [37]. PpTc elements and the CDS of the moss actin1 gene
(PpAct1) were used to retrieve assembled transcripts from the database. Compared to the 17
transcripts from PpAct1, 68 assembled transcripts containing the nucleotide sequences of the ORF
region were retrieved for PpTc1 and none for PpTc2, suggesting that the level of PpTc1 transcripts
in moss cells is higher than that of the constitutive gene actin1. Each of these transcripts
corresponds to a specific copy of the PpTc1 element. Nine of the PpTc1 transcripts can be
conceptually translated into a full-length intact transposase (Figure 4, Additional file 1:
Table S1). Each of these transcripts bearing an intact ORF is derived from one of the nine genomic
copies of PpTc1 bearing intact transposase coding sequences, suggesting that these elements are
actively transcribed and yield mature mRNA. The fact that no identical copies of PpTc1 were present
in the genomic sequence database suggests an attenuated transposition activity after the peak
amplification of the family around 1.5 million years ago. Since TE transcripts can be degraded by
siRNA and their translation may be blocked by microRNAs, the PpTc1 transcripts were used to search
against the small RNA databases [42-45]. However, no small RNA matching the coding sequence of the
PpTc1 transposase gene was retrieved, suggesting that the PpTc1 mRNAs are neither degraded nor
blocked from translation and therefore may be translated into transposase proteins. Because of the
abundance of the transcripts of the transposase gene, it is possible that a post-translational
mechanism such as the overproduction inhibition demonstrated for animal Tc1/mariner elements has
led to the repression of its transposition [46,47]. When PpTc2 sequences were used to search
against the assembled transcript database, no transcripts were retrieved. This suggests that the
expression of the transposase genes of this family is probably repressed at the transcriptional
level.

Evolutionary relationship of transposases encoded by 
moss TLEs to those of animal and fungal TLEs 

Since TLEs have previously been described only in animal and fungal genomes, the relationship of
the moss TLEs to other TLEs will help to understand the propagation of TLEs in plant genomes. Even
though there are only a few well characterized TLEs in the literature, recent progress in whole
genome sequencing has produced TLE sequences from many different genomes. Using well characterized
TLE transposase sequences including Tc1 (X01005), Tc3 (P34257.1), Minos (CAP09075.1), and Impala
(AF282722), together with PpTc1 and PpTc2, we retrieved representative TLE sequences in different
genomes from the non-redundant protein database of GenBank. The majority of these sequences were
not classified and are therefore named as hypothetical or unknown proteins. Notably, the TLE in
Rhizopus delemar was found to have at least 60 copies. After removal of redundant sequences
belonging to the same family, the sequences were aligned, together with those of the PpTc elements,
with the previously described TLEs, and a phylogenetic tree was constructed (Figure 5). Similar to
that reported previously, the branches on the phylogenetic tree of these elements have relatively
low bootstrap values (98% to 62%) [48]. Nevertheless, the topology of the previously analyzed
elements such as Tc1, Tc3, Impala, and Minos is consistent with that shown in the previous report.
Impala appears to have branched off early from the rest of the TLEs. The remaining elements are
grouped into two clades: the Tc1 clade and the Tc3 clade. The majority of these elements belong to
the Tc1 clade. The fact that the phylogenetic relationship among these elements is clearly
incongruent with that of their host species may suggest ancestral polymorphism or long branch
attraction [49]; alternatively, horizontal transfer of these elements among eukaryotic species may
also have contributed to the observation [50,51]. The two moss elements belong to different clades,
with PpTc1 in the Tc1 clade and PpTc2 in the Tc3 clade, further suggesting that these two elements
may have different origins.

Figure 4 Transcripts from PpTc elements. Query sequences shown: PpAct1-CDS, PpTc1, and PpTc2. Thick lines on top, query sequences; solid thin lines, matched regions between the queries and hits
in the transcript database; dotted lines, unmatched regions reflecting intronic regions; the coding DNA sequence (CDS) of the moss actin1 gene was
used as a control.

Figure 5 Phylogenetic relationship of transposases of moss TLEs to those of animal and fungal TLEs. Names, species followed by GI
numbers of each sequence; numbers on branches, percentage of bootstrap value of 1,000 reiterations.

Liu and Yang Mobile DNA 2014, 5:17
http://www.mobilednajournal.com/content/5/1/17

TLEs in angiosperm genome sequence databases 

To determine whether TLEs have proliferated throughout plant genomes, the predicted transposase
sequences of PpTc1 and PpTc2 were used as query sequences to search against all other plant genomic
sequences in the GenBank WGS and NR/NT databases using TBLASTN. Segments of Tc1-like transposase
coding sequences were identified in nine angiosperm genomes including rice (Oryza sativa), dwarf
birch (Betula nana), cabbage (Brassica rapa), hemp (Cannabis sativa), barley (Hordeum vulgare),
lettuce (Lactuca sativa), poplar (Populus trichocarpa), pear (Pyrus x bretschneideri), and wheat
(Triticum urartu) (Table 1). The conserved regions including at least the second (aspartic acid)
and the third (glutamic acid) residues of the DD34E catalytic motif were retrieved. Most of these
elements are single copies and they are not uniform in size. While the TLE in the database of Oryza
sativa is a complete element with intact terminal sequences, the majority of the plant TLEs are
fragmented and do not encode a complete transposase. When the regions between the second D and the
E residues of the DD34E motifs were aligned, conserved motifs surrounding these two residues were
revealed (Figure 6A and Additional file 1: Figure S2). The conserved motifs surrounding the E
residues of these TLEs are apparently different from those surrounding the corresponding D residue
of MLEs such as Mos1 (X78906), Soymar1 (AF078934.1), and Osmar5 (ACV32571.1). Among the sequenced
plant genomes, the distribution of the species containing TLEs is apparently patchy (Figure 7).
These results suggest that TLEs are also present in angiosperm genomes, but are much less abundant
than in the moss genome.

Table 1 Plant Tc1-like transposases described in this study

Element   Organism                  Accession
Plant
PpTc1     Physcomitrella patens     ABEU01007491
PpTc2     Physcomitrella patens     ABEU01006878
OsTc1     Oryza sativa Indica       AAAA02041396
BnTc1     Betula nana               CAOK01056615
BnTc2     Betula nana               CAOK01550459
BnTc3     Betula nana               CAOK01014729
BnTc4     Betula nana               CAOK01486111
BrTc1     Brassica rapa             AENI01020305
BrTc2     Brassica rapa             AENI01036930
CsTc1     Cannabis sativa           AGQN01308320
HvTc1     Hordeum vulgare           CAJV010227559
HvTc2     Hordeum vulgare           CAJV010272453
HvTc3     Hordeum vulgare           CAJV012609061
HvTc4     Hordeum vulgare           CAJV011622646
LsTc1     Lactuca sativa            AFSA01593962
LsTc2     Lactuca sativa            AFSA01593962
PtTc1     Populus trichocarpa       AARH01030986
PxbTc1    Pyrus x bretschneideri    AJSU01007483
PxbTc2    Pyrus x bretschneideri    AJSU01007483
TuTc1     Triticum urartu           AOTI010070343

All TLE elements bear the D34E of the DD34E catalytic motif.

Figure 6 Sequence alignment of the catalytic motifs of transposases (A) and end sequences (B) of plant TLEs. (A) The regions containing
the DDE/D catalytic motifs of the transposase sequences. Plant MLEs are shown at the bottom. (B) The terminal sequences of plant TLEs and Tc1.
The degree of background shading indicates different levels of conservation of sequences. Asterisks indicate elements that only have one end
present in a genomic contig. Abbreviations for species names: Os, Oryza sativa; Bn, Betula nana; Br, Brassica rapa; Cs, Cannabis sativa; Hv, Hordeum
vulgare; Ls, Lactuca sativa; Pt, Populus trichocarpa; Pb, Pyrus x bretschneideri; Tu, Triticum urartu.

Discussion 

The identification of TLEs in plant genomes expands our knowledge of the distribution and diversity
of Tc1/mariner elements. Elements belonging to the mariner subgroup have been found to be
widespread in plant genomes [18]. TLEs, however, have not been previously reported in plants. In
fact, PpTc1 and PpTc2 are the first Tc1/mariner elements described in moss. They not only extend
the known distribution of TLEs into plants, but also provide information for the future development
of TE-based tools for gene discovery in moss.

PpTc elements have undergone a recent wave of proliferation. The results suggest that their
transposition activities have subsequently been contained in the current moss genome. Although most
copies of PpTc elements have lost the capacity to encode a functional transposase due to mutations
that interrupt the transposase coding sequences, both families have members bearing full-length
intact transposase-coding genes, and PpTc1 elements are actively expressed in moss. These
observations indicate that, even though the transposition activity of PpTc1 may have been
attenuated, it may still be modestly active. In addition, since the genome was sequenced with a
shotgun approach, the reads for these repetitive sequences may have been misassembled. Therefore,
it is possible that identical PpTc1 sequences are present in the genome. The absence of transcripts
from PpTc2 may indicate a high level of repression of transposition. It remains unclear how these
elements are repressed. It is possible that, under certain environmental conditions, these elements
may become fully active in transposition. Alternatively, the activities of these elements may be
restricted to certain tissues or organs, or to specific temporal stages during the life cycle of
the plant. Further investigation of the repression of the transposition activities of both families
will facilitate our understanding of the interaction between TEs and their host genomes.
Figure 7 Patchy distribution of plant species containing TLEs. Based on the cladogram of sequenced plant genomes (up to April 2013)
generated by James Schnable at CoGe (http://genomevolution.org) and used with permission. Black, published genomes; Gray, unfinished 
genomes; Green highlight, species containing TLEs. 



Conclusions 

TLEs are present in plant genomes. The two families of TLEs in the moss genome amplified recently,
about 1 to 2 million years ago. These families contain elements that are potentially capable of
transposition, but their transposition activities appear to have been attenuated. TLEs were also
identified in the genome databases of angiosperm plants, suggesting their distribution in multiple
plant orders. The results presented in this report further our understanding of the evolutionary
history of Tc1/mariner elements and provide important information for future investigations into
the interaction between TEs and host genomes.

Methods 

Retrieval of moss Tc1-like elements

To identify transposons related to Tc1-like elements, the Tc1 transposase peptide sequence was used
as the query sequence to search against the GenBank databases of the P. patens genome with the
default parameters. Each returned hit was retrieved and inspected for TIRs. Each complete element
was searched against its host genome to obtain the members of its family. Nucleotide sequences of
full-length TLE copies were retrieved with the MITE Analysis Kit (MAK) function MEMBER
(http://labs.csb.utoronto.ca/yang/MAK/) [52,53]. Members of each family were retrieved with MAK
with zero tolerance for end mismatches.
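The first step of this retrieval, keeping only hits below an e-value cutoff before inspecting them for TIRs, could be scripted roughly as follows. This is a sketch under assumptions: it presumes the search results were saved in the standard 12-column BLAST tabular layout (the text does not state the output format used), the function name is made up, and the default cutoff of 1e-8 is one reading of the "<e-8" threshold reported in the Results.

```python
def filter_blast_hits(tabular_text: str, max_evalue: float = 1e-8):
    """Return (subject_id, start, end, evalue) for hits below `max_evalue`.

    Assumed column order (standard tabular output):
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            # Subject coordinates are reversed for minus-strand hits,
            # so report them in ascending order.
            start, end = sorted((int(fields[8]), int(fields[9])))
            hits.append((fields[1], start, end, evalue))
    return hits
```

The coordinates returned for each retained hit give the region whose flanking sequence would then be inspected for TIRs and TSDs.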



Characterization of moss TLEs 

Alignments of all retrieved members in each PpTc family were obtained with CLUSTAL, and a consensus
sequence was generated. The elements were conceptually translated and scanned for long ORFs with
the APE program (http://biologylabs.utah.edu/jorgensen/wayned/ape/). HTH motifs were predicted with
the NPS web server (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_hth.html) and
the conserved domain database at NCBI. The putative structural models of the PpTc1 and PpTc2
transposases were predicted with Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2/). Sequence alignments
were performed with MUSCLE at the EBI web server (http://www.ebi.ac.uk/Tools/msa/muscle/) and the
phylogenetic tree was constructed with Phylogeny.fr (www.phylogeny.fr) with 1,000 bootstrap
reiterations.
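The scan for long ORFs described above was done with the APE program. As a rough stand-in that only illustrates the idea, and does not reproduce APE's behavior, the following Python sketch reports the longest ATG-to-stop ORF over all six reading frames. The function names are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf_codons(dna: str) -> int:
    """Length, in codons excluding the stop, of the longest ATG...stop ORF.

    All three frames of both strands are scanned. ORFs that run off the
    end of the sequence without a stop codon are ignored.
    """
    dna = dna.upper()
    best = 0
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            in_orf = False
            length = 0
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if not in_orf:
                    if codon == "ATG":
                        in_orf, length = True, 1
                elif codon in STOP_CODONS:
                    best = max(best, length)
                    in_orf = False
                else:
                    length += 1
    return best
```

A copy would count as carrying an intact full-length ORF in the sense used here if this value reached the 338 codons reported for the PpTc transposases; checking for the DD34E motif in the translated product would be a separate step.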

Sequence divergence of PpTc1 and PpTc2 families

To calculate the average sequence divergence of a family, the consensus sequence of each family was
constructed. The consensus sequence was used as the input for the Divergence function of MAK. Each
divergence value is 100% minus the percent similarity in the pairwise alignment of a copy with the
consensus sequence. The output contains the sequence divergence value for each member. The average
divergence for each family was calculated. To plot the number of elements against divergence, the
individual divergence values were grouped into bins of 0.5% and the number of elements in each bin
was counted. The overall sequence similarity for a family is calculated as the complement of the
average sequence divergence.
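The binning and averaging steps above are straightforward to reproduce once per-copy divergence values are available. The Python sketch below assumes the values have already been obtained (for example from the MAK Divergence output) as percentages; the function names and the example values in the usage note are illustrative and are not data from the study.

```python
from collections import Counter


def bin_divergence(values_percent, bin_width=0.5):
    """Count elements per divergence bin.

    Returns a dict mapping the lower bound of each bin (in %) to the
    number of elements whose divergence falls in [lower, lower + width).
    """
    counts = Counter(int(v // bin_width) for v in values_percent)
    return {index * bin_width: n for index, n in sorted(counts.items())}


def family_similarity(values_percent):
    """Overall similarity (%) as the complement of the mean divergence."""
    mean_divergence = sum(values_percent) / len(values_percent)
    return 100.0 - mean_divergence
```

For instance, the made-up values 0.2, 0.4, 1.5, 1.7, and 2.1 fall into the bins starting at 0.0% (two elements), 1.5% (two elements), and 2.0% (one element). Values that sit exactly on a bin edge can be sensitive to floating-point rounding, so rounding the inputs first may be advisable.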

Expression analysis of PpTc families 

Moss TLEs PpTc1 and PpTc2 (ABEU01007491 and ABEU01006878, respectively) were used to search against
the assembled transcripts database Pp0409 on the moss genome browser (http://www.cosmoss.org/)
[41]. Returned hits were inspected for a long ORF that encodes a transposase bearing a DD34E
catalytic motif. The loci of transcripts were cross-referenced to the nucleotide BLAST hits to
remove redundancy. The sequences were also used to search against moss small RNA databases [42-45].

Analyses of TLEs in other plant genome databases 

Plant genome databases WGS and NR/NT were searched at NCBI using TBLASTN with the peptide sequences
of the putative transposases of PpTc1 and PpTc2. Hits and their flanking sequences were retrieved
to identify putative transposase or TIR sequences.

Additional file 



Additional file 1: Table S1. Transcripts of PpTc1 that produce a
conceptual full-length DD34E transposase. Figure S1. Related empty sites
(RESs) for moss TLEs. Figure S2. Sequence alignment of plant TLEs and
other Tc1/mariner representative elements using all predicted peptide
sequences.